Gas Data

The gas data describes individual gas stations, including their location, the services they offer, points of compromise (POC), and other relevant details. The columns in the dataset are:

  • X: Row identifier.
  • site_row_id: Identifier for the site.
  • STATE: State where the gas station is located.
  • county: County where the gas station is located.
  • ADDRESS: Street address of the gas station.
  • CITY: City where the gas station is located.
  • ycoord: Latitude coordinate of the gas station.
  • xcoord: Longitude coordinate of the gas station.
  • SITE_DESCRIPTION: Description of the gas station site.
  • service_or_fuel: Indicates whether the station provides service, fuel, or both.
  • diesel: Indicates if diesel fuel is available at the station.
  • twentyfour_hour_flag: Indicates if the station operates 24 hours.
  • car_wash: Indicates if the station has a car wash service.
  • truckstop_flag: Indicates if the station is a truck stop.
  • description: Additional description of the gas station.
  • PUMP_TECH: Pump technology used at the gas station.
  • POC: Points of compromise.
  • HIFCA: High Intensity Financial Crime Area.
  • ZIPnew: ZIP code of the gas station.
  • POCAGE: Age distribution of the POCs.
  • POCGAP: Age gap distribution of the POCs.
  • ZIPPOC: Number of POCs in the gas station’s ZIP code.
  • HFG: Human Factors Geometry.
  • MSA: Metropolitan Statistical Area.
  • dist.to.poc: Distance to a POC.
  • cate.poc.density: Categorized POC density.
  • cate.poc.age: Categorized POC age.
  • cate.poc.age.20: Categorized POC age, group 20.
  • cate.poc.intensity: Categorized POC intensity.
  • cate.poc.intensity.tot: Total categorized POC intensity.
  • MSA_POC: POCs for the Metropolitan Statistical Area.
  • MSA_POC.1: A second column of POCs for the Metropolitan Statistical Area.

The dataset combines station-level details, such as geographical coordinates and services offered, with POC-related measures for the area around each station.

gas <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/POC.csv")

Simple Leaflet Map

In the map below, each marker shows the location of a gas station, with latitude and longitude taken from the “ycoord” and “xcoord” columns of the gas dataset, respectively. To keep the map responsive, we plot a random sample of 500 stations. Clicking a marker opens a popup with the state, county, address, and ZIP code of that gas station. Explore the interactive map to see how gas stations are distributed across regions:

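# Load the packages used for the maps and for the pipe (%>%)
library(leaflet)
library(dplyr)

# Draw a random sample of 500 stations
gas_samp <- gas %>% sample_n(500)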

# Create a leaflet map
gas_map <- leaflet(data = gas_samp) %>%
  addTiles() %>%
  addMarkers(
    lng = ~xcoord,
    lat = ~ycoord,
    popup = ~paste("State: ", STATE, "<br>",
                   "County: ", county, "<br>",
                   "Address: ", ADDRESS, "<br>",
                   "Zip Code: ", ZIPnew)
  )
# Display the map
gas_map
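
Because the 500 stations are drawn at random, the map changes every time the document is rendered. The sketch below shows one way to make the draw repeatable by fixing the random seed first. It uses a made-up data frame (`toy`) and base R's `sample()` so that it runs on its own; the seed value 451 is arbitrary. The same `set.seed()` call placed before `sample_n()` should have the same effect, since dplyr draws from R's random number generator.

```r
# Toy stand-in for the gas data (hypothetical values, not taken from POC.csv)
toy <- data.frame(
  id     = 1:1000,
  xcoord = runif(1000, -125, -67),
  ycoord = runif(1000, 25, 49)
)

# Fixing the seed before sampling makes the draw repeatable
set.seed(451)
samp_a <- toy[sample(nrow(toy), 500), ]

set.seed(451)
samp_b <- toy[sample(nrow(toy), 500), ]

# Both draws select the same rows
identical(samp_a$id, samp_b$id)
```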

Leaflet Map

Below, we enhance the leaflet map by specifying the radius and color of the markers. Each marker now appears as a circle with a fixed radius of 5 and a color set to blue. Similar to the previous map, each circle represents a gas station, and hovering over a point reveals essential information such as the state, county, address, and ZIP code associated with that specific gas station. Explore the interactive map to visualize the distribution of gas stations with this enhanced visual representation.

# Create a leaflet map
gas_map2 <- leaflet(data = gas_samp) %>%
  addTiles() %>%
  setView(lng = mean(gas_samp$xcoord), 
          lat = mean(gas_samp$ycoord), 
          zoom = 13) %>%
  addProviderTiles("Esri.WorldGrayCanvas") %>%
  addCircleMarkers(
    ~xcoord, 
    ~ycoord,
    color = "blue",  # Adjust color as needed
    radius = 5,  # Adjust radius as needed
    stroke = FALSE, 
    fillOpacity = 0.4,
    label = ~paste0("State: ", STATE,
                    ", County: ", county,
                    ", Address: ", ADDRESS,
                    ", Zip Code: ", ZIPnew)
  ) %>%
  addLegend(position = "bottomright", 
            colors = "blue",  # Adjust color as needed
            labels = "Gas Station",
            title = "Gas Stations",
            opacity = 0.4)

# Display the map
gas_map2

Best Map

In this iteration, we introduce a more sophisticated leaflet map. The radius of each point on the map is determined by the number of Points of Compromise (POCs) in the gas station’s ZIP code. Therefore, larger circles represent ZIP codes with more POCs, providing a visual indicator of potential risk areas.

Additionally, the color of each point corresponds to the type of services offered by the gas station: “Fuel”, “Service Only”, or “Both”. However, since the dataset does not include any gas stations that offer “Service Only”, only the categories “Fuel” and “Both” will be displayed on the map.

As with the previous maps, hovering over a point reveals detailed information such as the state, county, address, and ZIP code associated with the respective gas station. Explore the map to gain insights into the distribution of gas stations and their associated services.

# Create a color palette based on service_or_fuel values
service_palette <- colorFactor(palette = "Set1", domain = gas_samp$service_or_fuel)

# Create the leaflet map
gas_map3 <- leaflet(data = gas_samp) %>%
  addTiles() %>%
  addProviderTiles("Esri.WorldGrayCanvas") %>%
  addCircleMarkers(
    ~xcoord, 
    ~ycoord,
    color = ~service_palette(service_or_fuel),  # Use colorFactor
    radius = ~ZIPPOC * 10,  # Scale radius by POC count in the ZIP code
    stroke = FALSE, 
    fillOpacity = 0.4,
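    # Hover labels escape HTML by default, so wrap each one in HTML()
    # to render <br> as a line break instead of literal text
    label = ~lapply(paste0("State: ", STATE, "<br>",
                           "County: ", county, "<br>",
                           "Address: ", ADDRESS, "<br>",
                           "Zip Code: ", ZIPnew),
                    htmltools::HTML)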
  ) %>%
  addLegend(position = "bottomright", 
            colors = service_palette(unique(gas_samp$service_or_fuel)),  # Use unique service_or_fuel values
            labels = unique(gas_samp$service_or_fuel),
            title = "Gas Stations",
            opacity = 0.4)

# Display the map
gas_map3
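
Multiplying the POC count by a constant ties the circle size directly to the raw range of the data, so a few large ZIP codes can cover the map while small ones nearly vanish. As an alternative, the sketch below rescales a numeric vector to a bounded pixel range after taking a square root, which compresses large counts. The helper name `scale_radius()` and the 3 to 20 pixel bounds are assumptions made for illustration; they are not part of leaflet or of the original analysis.

```r
# Hypothetical helper: map non-negative counts to a bounded radius in pixels
scale_radius <- function(x, min_px = 3, max_px = 20) {
  x <- sqrt(pmax(x, 0))              # square root compresses large counts
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) {
    # All values equal: return a mid-sized radius for every point
    return(rep((min_px + max_px) / 2, length(x)))
  }
  # Linear rescale of the square roots onto [min_px, max_px]
  min_px + (x - rng[1]) / diff(rng) * (max_px - min_px)
}

radii <- scale_radius(c(0, 1, 4, 9))
radii
```

In the map above, this could be used as `radius = ~scale_radius(ZIPPOC)` in place of `ZIPPOC * 10`.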

Philly Crime Data

The Philadelphia crime dataset contains information on various incidents, including details such as demographic characteristics, incident severity, location, and other relevant attributes. Here’s an explanation of the dataset columns:

  • dc_key: A unique identifier for each incident.
  • race: Specifies the racial background of the individuals involved, categorized as Black (Non-Hispanic), Hispanic (Black or White), and so on.
  • sex: Indicates the gender of the individuals involved, classified as Male or Female.
  • fatal: Indicates whether the incident resulted in a fatality (Fatal) or not (Nonfatal).
  • date: Records the date and time when the incident occurred.
  • has_court_case: Specifies whether the incident is associated with a court case (Yes/No).
  • age: Represents the age of the individuals involved in the incident.
  • street_name: Denotes the name of the street where the incident took place.
  • block_number: Indicates the block number related to the incident’s location.
  • zip_code: Provides the ZIP code of the incident location.
  • council_district: Identifies the council district corresponding to the incident location.
  • police_district: Identifies the police district corresponding to the incident location.
  • neighborhood: Specifies the neighborhood where the incident occurred.
  • house_district: Identifies the house district associated with the incident location.
  • senate_district: Identifies the senate district associated with the incident location.
  • school_catchment: Specifies the school catchment area associated with the incident location.
  • lng: Represents the longitude coordinate of the incident location.
  • lat: Represents the latitude coordinate of the incident location.

This dataset provides insight into the demographics of the individuals involved, the severity of the incidents, and their spatial distribution across Philadelphia’s neighborhoods and districts. Analyzing it can help identify patterns, trends, and areas of concern related to crime and public safety in the city.

We narrow the dataset to 2023 only. Because there is no variable denoting the year, we derive one, ‘year’, from the existing ‘date’ variable and then keep only the observations from 2023. The filtered dataset comprises 1666 observations and 19 variables, including the newly added ‘year’.

philly <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/PhillyCrimeSince2015.csv")

# Convert date variable to date format
philly$date <- as.Date(philly$date, format = "%m/%d/%Y %H:%M")

# Extract year from date variable
philly$year <- format(philly$date, "%Y")

philly <- subset(philly, year=="2023")
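
The same date-to-year steps can be tried on a small made-up data frame, which also makes it easy to check for parsing failures. `as.Date()` returns `NA` for any string that does not match the format, and those rows would then drop out of the 2023 subset without warning. The values below are invented for illustration and assume the same "%m/%d/%Y %H:%M" layout used above.

```r
# Toy incident dates (hypothetical, same layout as assumed for the real file)
toy <- data.frame(
  date = c("01/15/2022 13:05", "03/02/2023 00:30", "12/31/2023 23:59"),
  stringsAsFactors = FALSE
)

# Parse the date part; the time of day is dropped by as.Date()
toy$date <- as.Date(toy$date, format = "%m/%d/%Y %H:%M")

# Derive the year as a character string, then filter
toy$year <- format(toy$date, "%Y")
toy_2023 <- subset(toy, year == "2023")

# Sanity checks: no unparsed dates, and the expected number of 2023 rows
sum(is.na(toy$date))
nrow(toy_2023)
```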

Leaflet Map

Now, let’s visualize fatal versus non-fatal crimes that occurred in Philadelphia in 2023 on a leaflet map. We again use the “color” argument to differentiate between the two types of crimes: each category, “Fatal” or “Nonfatal,” is assigned a distinct color. The map follows a similar format to the ones above, with each circle marking a specific crime incident. Hovering over a point reveals the “Neighborhood,” “Date,” “Race,” “Sex,” “Age,” and “Street” associated with that crime. On visual inspection, non-fatal crimes appear to far outnumber fatal ones, but confirming this would require counting the incidents in each category.

library(leaflet)
library(dplyr)
# Create color palette for fatal and non-fatal crimes
fatal <- "red"
non_fatal <- "blue"

# Create leaflet map
map <- leaflet(philly) %>%
  addTiles() %>%
  addCircleMarkers(
    ~lng, ~lat,
    color = ifelse(philly$fatal == "Fatal", fatal, non_fatal),
    radius = 5,
    label = ~paste("Neighborhood: ", neighborhood,
                   "Date: ", date,
                   "Race: ", race,
                   "Sex: ", sex,
                   "Age: ", age,
                   "Street: ", street_name),
    labelOptions = labelOptions(
      direction = "auto"
    )
  ) %>%
  addLegend(
    position = "bottomright",
    colors = c(fatal, non_fatal),
    labels = c("Fatal", "Non-Fatal"),
    title = "Crime Type"
  ) %>%
  addScaleBar() %>%
  addControl(
    html = "<h4>Philadelphia Crime Locations (2023)</h4>",
    position = "topright"
  )

# Display the map
map
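
A simple count is one way to follow up on the visual impression of how fatal and non-fatal incidents compare. The sketch below tabulates a made-up `fatal` column and converts the counts to percentages; the five values are invented, and on the real data the same calls would be applied to `philly$fatal`.

```r
# Toy outcome column (hypothetical values using the same two labels)
toy <- data.frame(
  fatal = c("Nonfatal", "Nonfatal", "Fatal", "Nonfatal", "Nonfatal")
)

# Number of incidents in each category
counts <- table(toy$fatal)

# Share of each category, in percent
shares <- round(100 * prop.table(counts), 1)

counts
shares
```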

Better Leaflet Map

Now, let’s create an enhanced leaflet map of fatal versus non-fatal crimes in Philadelphia, this time drawn over the city’s neighborhood boundaries. We again use the “color” argument, with one color for crimes labeled fatal and another for non-fatal ones, so the crime type is easy to identify. Each crime location is drawn as a circle marker. Clicking a point opens a popup with the “Object ID”, “Year”, “Race”, “Sex”, “Age”, “Wound”, and “Location” of that incident.

# Load required libraries
library(leaflet)
library(sf)

# Suppress warnings while reading GeoJSON files
options(warn=-1)

# Read the data without printing messages
philly <- st_read("https://pengdsci.github.io/STA553VIZ/w08/PhillyShootings.geojson", quiet = TRUE)
phillyNeighbor <- st_read("https://pengdsci.github.io/STA553VIZ/w08/Neighborhoods_Philadelphia.geojson", quiet = TRUE)

# Reset warning settings
options(warn=0)

# st_read() already returns an sf object, so no conversion is needed;
# the point_x and point_y columns remain available for the markers below
philly_sf <- philly

# Define color palette for fatal and non-fatal crimes
fatal_color <- "red"
non_fatal_color <- "gold"

# Create leaflet map
map <- leaflet() %>%
  addProviderTiles(providers$Esri.WorldGrayCanvas) %>%
  addPolygons(data = phillyNeighbor,
              color = 'skyblue',
              weight = 1)  %>%
  addCircleMarkers(data = philly_sf,
                   ~point_x, ~point_y,
                   color = ifelse(philly$fatal == 1, fatal_color, non_fatal_color),
                   radius = 5,
                   popup = ~paste("Object ID: ", objectid,
                                  "<br>Year: ", year,
                                  "<br>Race: ", race,
                                  "<br>Sex: ", sex,
                                  "<br>Age: ", age,
                                  "<br>Wound: ", wound,
                                  "<br>Location: ", location),
                   labelOptions = labelOptions(
                     direction = "auto"
                   )
  ) %>%
  addLegend(
    position = "bottomright",
    colors = c("red", "gold"),
    labels = c("Fatal", "Non-Fatal"),
    title = "Crime Type"
  ) %>%
  addScaleBar() %>%
  addControl(
    html = "<h4>Philadelphia Crime Locations (2015-2024)</h4>",
    position = "topright"
  ) %>%
  setView(lng = -75.1527, lat = 39.9707, zoom = 11)

# Display the map
map



U.S. Presidential Election Data (2000-2020)

Our initial dataset, named “election”, encompasses Presidential election outcomes spanning the years 2000, 2004, 2008, 2012, 2016, and 2020. With 72,617 observations and 12 variables, it provides comprehensive insights into each state’s and county’s election results, detailing the winning candidate in each county, along with the total votes received by each candidate.

Prior to analysis, some data cleaning was imperative, particularly concerning the county FIPS codes—a unique 5-digit identifier assigned to every county in the United States. Initially, certain codes erroneously contained only 4 digits, notably when a “0” preceded the first digit. For instance, Autauga County, Alabama’s FIPS code “01001” was recorded as “1001” in the dataset. This discrepancy was rectified using the “TEXT” function in Excel, applied before importing the data into the “election” set.
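
The same zero-padding can also be done in R after import, which avoids the spreadsheet step. The sketch below is an alternative to the Excel approach described above, not the method used for this dataset; the three codes are sample values chosen for illustration.

```r
# County FIPS codes as they appear once the leading zero has been lost
fips_raw <- c(1001, 6037, 48201)

# Pad to five digits with leading zeros; the result is a character vector
fips_fixed <- sprintf("%05d", as.integer(fips_raw))

fips_fixed
```

Storing the codes as character strings matters here: a numeric column would drop the leading zero again.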

Utilizing the “election” dataset, our objective is to split the data into county-level and state-level subsets. Both subsets include a new variable named “party_percentage,” calculated to ascertain the percentage of voters favoring the winning party within their respective state or county. The “county_data” subset provides election results categorized by county, while the “state_data” subset presents election outcomes aggregated by state. Furthermore, both subsets retain solely the winning party’s information for analysis.

# Load the required library
library(dplyr)

# Read the data
election <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/PresidentialElection2000To2020.csv")

# County-level Data
county_data <- election %>%
  group_by(year, state, county_name) %>%
  mutate(party_percentage = candidatevotes / sum(candidatevotes) * 100) %>%
  filter(party_percentage == max(party_percentage)) %>%
  select(year, state, county_fips, party, candidate, candidatevotes, party_percentage)

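# State-level Data
# Sum county rows to one row per state and party before computing shares;
# otherwise the filter keeps only the single largest county-level row per state.
state_data <- election %>%
  group_by(year, state, party, candidate) %>%
  summarise(candidatevotes = sum(candidatevotes), .groups = "drop") %>%
  group_by(year, state) %>%
  mutate(party_percentage = candidatevotes / sum(candidatevotes) * 100) %>%
  filter(party_percentage == max(party_percentage)) %>%
  ungroup() %>%
  select(year, state, party, candidate, candidatevotes, party_percentage)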

# Save county-level data to a new CSV file
write.csv(county_data, file = "county_level_data.csv", row.names = FALSE)

# Save state-level data to a new CSV file
write.csv(state_data, file = "state_level_data.csv", row.names = FALSE)
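
A state-level winner is the party with the most votes once the county rows have been added together, which is not always the party holding the largest single county row. The base R sketch below shows this on invented numbers for one made-up state, "XX", with three counties; only the column names follow the election data.

```r
# Toy county-level results (hypothetical votes)
toy <- data.frame(
  state          = "XX",
  county_name    = c("A", "A", "B", "B", "C", "C"),
  party          = rep(c("DEMOCRAT", "REPUBLICAN"), times = 3),
  candidatevotes = c(900, 100, 100, 700, 100, 700)
)

# Step 1: sum county rows to one row per state and party
state_totals <- aggregate(candidatevotes ~ state + party, data = toy, FUN = sum)

# Step 2: each party's share of the state total, in percent
state_totals$party_percentage <- 100 * state_totals$candidatevotes /
  ave(state_totals$candidatevotes, state_totals$state, FUN = sum)

# Step 3: keep the row with the largest share in each state
is_winner <- state_totals$party_percentage ==
  ave(state_totals$party_percentage, state_totals$state, FUN = max)
winner <- state_totals[is_winner, ]

winner
```

Here the largest single row is the 900 Democratic votes in county A, yet the Republican total of 1500 exceeds the Democratic total of 1100, so the aggregated result names the Republican party as the state winner.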

Choropleth Map

Now that we’ve split the dataset into “county_data,” focusing solely on election results (specifically the winning party) at the county level, we can leverage Tableau, an interactive data visualization tool, to craft a Choropleth Map. This map will display presidential election outcomes at the county level. Different colors are assigned to represent the major political parties (Democrat & Republican), and each county’s shading reflects the winning political party in a specific election year. The interactive map includes a filter to alter the displayed year(s). Additionally, hover text appears when hovering over a specific county on the map, providing information such as “year,” “state,” “party,” “candidatevotes,” and “party_percentage” for the respective county.


